Abstract



We describe a new paradigm for implementing inference in belief networks, which con- 
sists of two steps: (1) compiling a belief network into an arithmetic expression called a 
Query DAG (Q-DAG); and (2) answering queries using a simple evaluation algorithm. 
Each node of a Q-DAG represents a numeric operation, a number, or a symbol for ev- 
idence. Each leaf node of a Q-DAG represents the answer to a network query, that is, 
the probability of some event of interest. It appears that Q-DAGs can be generated us- 
ing any of the standard algorithms for exact inference in belief networks — we show how 
they can be generated using clustering and conditioning algorithms. The time and space 
complexity of a Q-DAG generation algorithm is no worse than the time complexity of the 
inference algorithm on which it is based. The complexity of a Q-DAG evaluation algorithm 
is linear in the size of the Q-DAG, and such inference amounts to a standard evaluation of 
the arithmetic expression it represents. The intended value of Q-DAGs is in reducing the 
software and hardware resources required to utilize belief networks in on-line, real-world
applications. The proposed framework also facilitates the development of on-line inference 
on different software and hardware platforms due to the simplicity of the Q-DAG evaluation 
algorithm. Interestingly enough, Q-DAGs were found to serve other purposes: simple tech- 
niques for reducing Q-DAGs tend to subsume relatively complex optimization techniques 
for belief-network inference, such as network-pruning and computation-caching. 

1. Introduction 

Consider designing a car to have a self-diagnostic system that can alert the driver to a range 
of problems. Figure 1 shows a simplistic belief network that could provide a ranked set 
of diagnoses for car troubleshooting, given input from sensors hooked up to the battery, 
alternator, fuel-tank and oil-system. 

The standard approach to building such a diagnostic system is to put this belief network, 
along with inference code, onto the car's computer; see Figure 2. We have encountered a 
number of difficulties when using this approach to embody belief network technology in in- 
dustrial applications. First, we were asked to provide the technology on multiple platforms. 
For some applications, the technology had to be implemented in ADA to pass certain certi- 
fication procedures. In others, it had to be implemented on domain-specific hardware that 
only supports very primitive programming languages. Second, memory was limited to keep
the cost of a unit below a certain threshold to maintain product profitability. The dilemma
was the following: belief network algorithms are not trivial to implement, especially when
optimization is crucial, and porting these algorithms to multiple platforms and languages would
have been prohibitively expensive, time-consuming and demanding of qualified manpower.

Figure 1: A simple belief network for car diagnosis.

To overcome these difficulties, we have devised a very flexible approach for implementing 
belief network systems, which is based on the following observation. Almost all the work 
performed by standard algorithms for belief networks is independent of the specific evidence 
gathered about variables. For example, if we run an algorithm with the battery-sensor set 
to low and then run it later with the variable set to dead, we find almost no algorithmic 
difference between the two runs. That is, the algorithm will not branch differently on any 
of the key decisions it makes, and the only difference between the two runs is the specific 
arguments to the invoked numeric operations. Therefore, one can apply a standard inference 
algorithm on a network with evidence being a parameter instead of being a specific value. The 
result returned by the algorithm will then be an arithmetic expression with some parameters 
that depend on specific evidence. This parameterized expression is what we call a Query 
DAG, an example of which is shown in Figure 4.¹

The approach we are proposing consists of two steps. First, given a belief network, a set 
of variables about which evidence may be collected (evidence variables), and a set of vari- 
ables for which we need to compute probability distributions (query variables), a Q-DAG 
is compiled off-line, as shown in Figure 3. The compilation is typically done on a sophisti- 
cated software/hardware platform, using a traditional belief network inference algorithm in 
conjunction with the Q-DAG compilation method. This part of the process is far and away 
the most costly computationally. Second, an on-line system composed from the generated 
Q-DAG and an evaluator specific to the given platform is used to evaluate the Q-DAG. Given 
evidence, the parameterized arithmetic expression is evaluated in a straightforward manner 
using simple arithmetic operations rather than complicated belief network inference. The
computational work needed to perform this on-line evaluation is so straightforward that it
lends itself to easy implementations on different software and hardware platforms.

1. The sharing of subexpressions is what makes this a Directed Acyclic Graph instead of a tree.

Figure 2: This figure compares the traditional approach to exact belief-network inference
(shown on the left) with our new compiled approach (shown on the right) in the
context of diagnostic reasoning. In the traditional approach, the belief network
and sensor values are used on-line to compute the probability distributions over
fault variables; in the compiled approach, the belief network, fault variables and
sensor variables are compiled off-line to produce a Q-DAG, which is then evaluated
on-line using sensor values to compute the required distributions.

This approach shares some commonality with other methods that symbolically manip- 
ulate probability expressions, like SPI (Li & D'Ambrosio, 1994; Shachter, D'Ambrosio, & 
del Favero, 1990); it differs from SPI on the objective of such manipulations and, hence, 
on the results obtained. SPI explicates the notion of an arithmetic expression to state that 
belief-network inference can be viewed as an expression-factoring operation. This allows 
results from optimization theory to be utilized in belief-network inference. On the other 
hand, we define an arithmetic expression to explicate and formalize the boundaries between 
on-line and off-line inference, with the goal of identifying the minimal piece of software that 
is required on-line. Our results are therefore oriented towards this purpose and they include: 
(a) a formal definition of a Q-DAG and its evaluator; (b) a method for generating Q-DAGs 
using standard inference algorithms (an algorithm need not subscribe to the inference-as-factoring
view to be used for Q-DAG generation); and (c) computational guarantees on the
size of Q-DAGs in terms of the computational guarantees of the inference algorithm used
to generate them. Although the SPI framework is positioned to formulate related results, it
has not been pursued in this direction.

Figure 3: The proposed framework for implementing belief-network inference.

Figure 4: A belief network (a); and its corresponding Query-DAG (b). Here, C is an evidence
variable, and we are interested in the probability of variable B.

It is important to stress the following properties of the proposed approach. First, declar- 
ing an evidence variable in the compilation process does not mean that evidence must be 
collected about that variable on-line — this is important because some evidence values, e.g., 
from sensors, may be lost in practice — it only means that evidence may be collected. There- 
fore, one can declare all variables to be evidence if one wishes. Second, a variable can be 
declared to be both evidence and query. This allows one to perform value-of-information 
computations to decide whether it is worth collecting evidence about a specific variable.
Third, the space complexity of a Q-DAG in terms of the number of evidence variables is no
worse than the time complexity of its underlying inference algorithm; therefore, this is not
a simple enumerate-all-possible-cases approach. Finally, the time and space complexity for
generating a Q-DAG is no worse than the time complexity of the standard belief-network 
algorithm used in its generation. Therefore, if a network can be solved using a standard 
inference algorithm, and if the time complexity of this algorithm is no worse than its space 
complexity,² then we can construct a Q-DAG for that network.

The following section explains the concept of a Q-DAG with a concrete example and 
provides formal definitions. Section 3 is dedicated to the generation of Q-DAGs and their 
computational complexity, showing that any standard belief-network inference algorithm 
can be used to compile a Q-DAG as long as it meets some general conditions. Section 4 
discusses the reduction of a Q-DAG after it has been generated, showing that such reduction 
subsumes key optimizations that are typically implemented in belief network algorithms. 
Section 5 contains a detailed example on the application of this framework to diagnostic 
reasoning. Finally, Section 6 closes with some concluding remarks. 

2. Query DAGs 

This section starts our treatment of Q-DAGs with a concrete example. We will consider a 
particular belief network, define a set of queries of interest, and then show a Q-DAG that 
can be used to answer such queries. We will not discuss how the Q-DAG is generated; only 
how it can be used. This will allow a concrete introduction to Q-DAGs and will help us 
ground some of the formal definitions to follow. 

The belief network we will consider is the one in Figure 4(a). The class of queries we
are interested in is Pr(B | C), that is, the probability that variable B takes some value
given some known (or unknown) value of C. Figure 4(b) depicts a Q-DAG for answering
such queries, which is essentially a parameterized arithmetic expression where the values of
parameters depend on the evidence obtained. This Q-DAG will actually answer queries of
the form Pr(B, C), but we can use normalization to compute Pr(B | C).

First, a number of observations about the Q-DAG in Figure 4(b): 

• The Q-DAG has two leaf nodes labeled Pr(B=ON, c) and Pr(B=OFF, c). These are
called query nodes because their values represent answers to the queries Pr(B=ON, c)
and Pr(B=OFF, c).

• The Q-DAG has two root nodes labeled (C, ON) and (C, OFF). These are called 
Evidence Specific Nodes (ESNs) since their values depend on the evidence collected 
about variable C on-line. 

According to the semantics of Q-DAGs, the value of node (V, v) is 1 if variable V is
observed to be v or is unknown, and 0 otherwise. Once the values of ESNs are determined,
we evaluate the remaining nodes of a Q-DAG using numeric multiplication and addition. 
The numbers that get assigned to query nodes as a result of this evaluation are the answers 
to queries represented by these nodes. 

2. Algorithms based on join trees have this property. 
Figure 5: Evaluating the Q-DAG in Figure 4 with respect to two pieces of evidence: (a)
C=ON and (b) C=OFF. 



For example, suppose that the evidence we have is C = ON. Then ESN (C, ON) is 
evaluated to 1 and ESN (C, OFF) is evaluated to 0. The Q-DAG in Figure 4(b) is then 
evaluated as given in Figure 5(a), thus leading to 

Pr(B=ON, C=ON) = .3475,

and 

Pr(B=OFF,C=ON) = .2725, 

from which we conclude that Pr(C = ON) = .62. We can then compute the conditional 
probabilities Pr(B=ON | C=ON) and Pr(B=OFF | C=ON) using: 

Pr(B=ON | C=ON) = Pr(B=ON, C=ON) / Pr(C=ON),

Pr(B=OFF | C=ON) = Pr(B=OFF, C=ON) / Pr(C=ON).

If the evidence we have is C=OFF, however, then (C, ON) evaluates to 0 and (C, OFF)
evaluates to 1. The Q-DAG in Figure 4(b) will then be evaluated as given in Figure 5(b),
thus leading to 

Pr(B=ON,C=OFF) = .2875, 

and 

Pr(B=OFF,C=OFF) = .0925. 
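
The two evaluations above, and the normalization step, can be reproduced with a few lines of arithmetic. The following Python sketch hard-codes the expression of Figure 4(b) using the numbers quoted in the text; the function names `evaluate_figure4` and `normalize` are illustrative assumptions and not part of the framework itself.

```python
def evaluate_figure4(c_on: float, c_off: float) -> dict:
    """Evaluate the Q-DAG of Figure 4(b) given the two ESN values.

    c_on and c_off are the values of the ESNs (C, ON) and (C, OFF):
    1 if the value is consistent with the evidence, 0 otherwise.
    """
    # Shared subexpressions, one per value of A (the sharing is what
    # makes the expression a DAG rather than a tree).
    a_on = 0.9 * c_on + 0.1 * c_off
    a_off = 0.5 * c_on + 0.5 * c_off
    return {
        "ON": 0.075 * a_on + 0.56 * a_off,   # Pr(B=ON, c)
        "OFF": 0.225 * a_on + 0.14 * a_off,  # Pr(B=OFF, c)
    }


def normalize(joint: dict) -> dict:
    """Turn Pr(B, c) into Pr(B | c) by dividing by Pr(c)."""
    total = sum(joint.values())
    return {b: p / total for b, p in joint.items()}


joint_on = evaluate_figure4(c_on=1, c_off=0)    # evidence C=ON
joint_off = evaluate_figure4(c_on=0, c_off=1)   # evidence C=OFF
posterior_on = normalize(joint_on)              # Pr(B | C=ON)
```

The on-line part is nothing more than multiplications and additions, which is the point of the compiled approach.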

We will use the following notation for denoting variables and their values. Variables 
are denoted using uppercase letters, such as A,B,C, and variable values are denoted by 
lowercase letters, such as a, b, c. Sets of variables are denoted by boldface uppercase letters, 
such as A, B, C, and their instantiations are denoted by boldface lowercase letters, such as 
a, b,c. We use E to denote the set of variables about which we have evidence. Therefore, 
we use e to denote an instantiation of these variables that represents evidence. Finally, the
family of a variable is the set containing the variable and its parents in a directed acyclic 
graph. 

Following is the formal definition of a Q-DAG. 
Definition 1 A Q-DAG is a tuple $(\mathcal{V}, \diamond, \mathcal{I}, \mathcal{D}, \mathcal{Z})$ where

1. $\mathcal{V}$ is a distinguished set of symbols (called evidence variables)

2. $\diamond$ is a symbol (called unknown value)

3. $\mathcal{I}$ maps each variable in $\mathcal{V}$ into a set of symbols (called variable values) different from
$\diamond$.

4. $\mathcal{D}$ is a directed acyclic graph where

- each non-root node is labeled with either + or *

- each root node is labeled with either

  - a number in [0, 1] or

  - a pair (V, v) where V is an evidence variable and v is a value

5. $\mathcal{Z}$ is a distinguished set of nodes in $\mathcal{D}$ (called query nodes)

Evidence variables $\mathcal{V}$ correspond to network variables about which we expect to collect
evidence on-line. For example, in Figure 5, C is the evidence variable. Each one of these
variables has a set of possible values that are captured by the function $\mathcal{I}$. For example, in
Figure 5, the evidence variable C has values ON and OFF. The special value $\diamond$ is used
when the value of a variable is not known. For example, we may have a sensor variable with
values "low," "medium," and "high," but then lose the sensor value during on-line reasoning.
In this case, we set the sensor value to $\diamond$.³ Query nodes are those representing answers to
user queries. For example, in Figure 5, B is the query variable, and leads to query nodes
Pr(B=ON, c) and Pr(B=OFF, c).

An important notion is that of evidence: 

Definition 2 For a given Q-DAG $(\mathcal{V}, \diamond, \mathcal{I}, \mathcal{D}, \mathcal{Z})$, evidence is defined as a function $\mathcal{E}$ that
maps each variable V in $\mathcal{V}$ into the set of values $\mathcal{I}(V) \cup \{\diamond\}$.

When a variable V is mapped into $v \in \mathcal{I}(V)$, then evidence tells us that V is instantiated to
value v. When V is mapped into $\diamond$, then evidence does not tell us anything about the value
of V.
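
Definition 2 amounts to a simple well-formedness check on the input of the on-line system. The Python sketch below illustrates it for the single evidence variable C of the running example; the names `UNKNOWN`, `variable_values` and `is_valid_evidence` are hypothetical, and the sentinel object merely stands in for the unknown value.

```python
UNKNOWN = object()  # sentinel standing in for the unknown value

# The mapping from each evidence variable to its set of possible values.
variable_values = {"C": {"ON", "OFF"}}


def is_valid_evidence(evidence: dict) -> bool:
    """True if evidence assigns every evidence variable either one of
    its declared values or UNKNOWN, and mentions no other variable."""
    if set(evidence) != set(variable_values):
        return False
    return all(
        evidence[var] is UNKNOWN or evidence[var] in allowed
        for var, allowed in variable_values.items()
    )
```

For instance, `is_valid_evidence({"C": UNKNOWN})` holds, which is how a lost sensor reading would be represented under these assumptions.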

We can now state formally how to evaluate a Q-DAG given some evidence. But first we 
need some more notation: 

1. Numeric-Node: n(p) denotes a node labeled with a number $p \in [0, 1]$;

2. ESN: n(V, v) denotes a node labeled with (V, v); 

3. This is also useful in cases where a variable will be measured only if its value of information justifies 
that. 






Darwiche & Provan 



3. Operation-Node: $n_1 \otimes \cdots \otimes n_i$ denotes a node labeled with * and having parents
$n_1, \ldots, n_i$;

4. Operation-Node: $n_1 \oplus \cdots \oplus n_i$ denotes a node labeled with + and having parents
$n_1, \ldots, n_i$.

The following definition tells us how to evaluate a Q-DAG by evaluating each of its nodes.
It is a recursive definition according to which the value assigned to a node is a function of
the values assigned to its parents. The first two cases are boundary conditions, assigning
values to root nodes. The last two cases are the recursive ones.

Definition 3 For a Q-DAG $(\mathcal{V}, \diamond, \mathcal{I}, \mathcal{D}, \mathcal{Z})$ and evidence $\mathcal{E}$, the node evaluator is defined as
a function $\mathcal{M}_{\mathcal{E}}$ that maps each node in $\mathcal{D}$ into a number in [0, 1] such that:

1. $\mathcal{M}_{\mathcal{E}}[n(p)] = p$

(The value of a node labeled with a number is the number itself.)

2. $\mathcal{M}_{\mathcal{E}}[n(V, v)] = 1$ if $\mathcal{E}(V) = v$ or $\mathcal{E}(V) = \diamond$, and $0$ otherwise

(The value of an evidence-specific node depends on the available evidence: it is 1 if v
is consistent with the evidence and 0 otherwise.)

3. $\mathcal{M}_{\mathcal{E}}[n_1 \otimes \cdots \otimes n_i] = \mathcal{M}_{\mathcal{E}}(n_1) * \cdots * \mathcal{M}_{\mathcal{E}}(n_i)$

(The value of a node labeled with * is the product of the values of its parent nodes.)

4. $\mathcal{M}_{\mathcal{E}}[n_1 \oplus \cdots \oplus n_i] = \mathcal{M}_{\mathcal{E}}(n_1) + \cdots + \mathcal{M}_{\mathcal{E}}(n_i)$

(The value of a node labeled with + is the sum of the values of its parent nodes.)

One is typically not interested in the values of all nodes in a Q-DAG since most of these
nodes represent intermediate results that are of no interest to the user. It is the query nodes
of a Q-DAG that represent answers to user queries and it is the values of these nodes that one
seeks when constructing a Q-DAG. The values of these queries are captured by the notion
of a Q-DAG output.

Definition 4 The node evaluator $\mathcal{M}_{\mathcal{E}}$ is extended to Q-DAGs as follows:

$$\mathcal{M}_{\mathcal{E}}((\mathcal{V}, \diamond, \mathcal{I}, \mathcal{D}, \mathcal{Z})) = \{(n, \mathcal{M}_{\mathcal{E}}(n)) \mid n \in \mathcal{Z}\}.$$

The set $\mathcal{M}_{\mathcal{E}}((\mathcal{V}, \diamond, \mathcal{I}, \mathcal{D}, \mathcal{Z}))$ is called the Q-DAG output.

This output is what one seeks from a Q-DAG. Each element in this output represents a
probabilistic query and its answer.

Let us consider a few evaluations of the Q-DAG shown in Figure 4, which are shown in
Figure 5. Given evidence $\mathcal{E}(C) = \mathit{ON}$, and assuming that Qnode(B=ON) and Qnode(B=OFF)
stand for the Q-DAG nodes labeled Pr(B=ON, c) and Pr(B=OFF, c), respectively,
we have

$\mathcal{M}_{\mathcal{E}}[n(C, \mathit{ON})] = 1$,

$\mathcal{M}_{\mathcal{E}}[n(C, \mathit{OFF})] = 0$,

$\mathcal{M}_{\mathcal{E}}[\mathit{Qnode}(B = \mathit{ON})] = .075 * (.9 * 1 + .1 * 0) + .56 * (1 * .5 + .5 * 0) = .3475$,

$\mathcal{M}_{\mathcal{E}}[\mathit{Qnode}(B = \mathit{OFF})] = (.9 * 1 + .1 * 0) * .225 + (1 * .5 + .5 * 0) * .14 = .2725$,






A Practical Paradigm for Implementing Belief-Network Inference 



meaning that Pr(B=ON, C=ON) = .3475 and Pr(B=OFF, C=ON) = .2725. If instead the
evidence were $\mathcal{E}(C) = \mathit{OFF}$, a set of analogous computations can be done.

It is also possible that evidence tells us nothing about the value of variable C, that is,
$\mathcal{E}(C) = \diamond$. In this case, we would have

$\mathcal{M}_{\mathcal{E}}[n(C, \mathit{ON})] = 1$,

$\mathcal{M}_{\mathcal{E}}[n(C, \mathit{OFF})] = 1$,

$\mathcal{M}_{\mathcal{E}}[\mathit{Qnode}(B = \mathit{ON})] = .075 * (.9 * 1 + .1 * 1) + .56 * (1 * .5 + .5 * 1) = .635$,

$\mathcal{M}_{\mathcal{E}}[\mathit{Qnode}(B = \mathit{OFF})] = (.9 * 1 + .1 * 1) * .225 + (1 * .5 + .5 * 1) * .14 = .365$,

meaning that Pr(B=ON) = .635 and Pr(B=OFF) = .365.
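
The recursive node evaluator of Definition 3 and the Q-DAG output of Definition 4 translate almost directly into code. The following Python sketch is one possible rendering, not the authors' implementation: the `Node` class, the use of `None` for the unknown value, and the names `evaluate` and `qdag_output` are all assumptions made for illustration. The Q-DAG built at the bottom is that of Figure 4(b), using the numbers from the evaluations above.

```python
from dataclasses import dataclass
from math import prod

UNKNOWN = None  # plays the role of the unknown value


@dataclass(frozen=True)
class Node:
    label: object        # "+", "*", a number, or an ESN pair (variable, value)
    parents: tuple = ()  # empty for root nodes


def evaluate(node, evidence, cache):
    """Node evaluator: the four cases of Definition 3, with a cache so
    that shared subexpressions are evaluated only once."""
    if id(node) in cache:
        return cache[id(node)]
    if node.label == "*":
        value = prod(evaluate(p, evidence, cache) for p in node.parents)
    elif node.label == "+":
        value = sum(evaluate(p, evidence, cache) for p in node.parents)
    elif isinstance(node.label, tuple):          # evidence-specific node
        variable, v = node.label
        value = 1.0 if evidence[variable] in (v, UNKNOWN) else 0.0
    else:                                        # numeric node
        value = float(node.label)
    cache[id(node)] = value
    return value


# The Q-DAG of Figure 4(b).
c_on, c_off = Node(("C", "ON")), Node(("C", "OFF"))
m_a_on = Node("+", (Node("*", (Node(0.9), c_on)), Node("*", (Node(0.1), c_off))))
m_a_off = Node("+", (Node("*", (Node(0.5), c_on)), Node("*", (Node(0.5), c_off))))
query_nodes = {
    "B=ON": Node("+", (Node("*", (Node(0.075), m_a_on)),
                       Node("*", (Node(0.56), m_a_off)))),
    "B=OFF": Node("+", (Node("*", (Node(0.225), m_a_on)),
                        Node("*", (Node(0.14), m_a_off)))),
}


def qdag_output(evidence):
    """Q-DAG output: the value of every query node under the evidence."""
    cache = {}
    return {name: evaluate(n, evidence, cache) for name, n in query_nodes.items()}
```

Calling `qdag_output({"C": "ON"})` and `qdag_output({"C": UNKNOWN})` corresponds to the two evaluations worked out above.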



2.1 Implementing a Q-DAG Evaluator 

A Q-DAG evaluator can be implemented using an event-driven, forward propagation scheme. 
Whenever the value of a Q-DAG node changes, one updates the value of its children, and so 
on, until no possible update of values is possible. Another way to implement an evaluator 
is using a backward propagation scheme where one starts from a query node and updates 
its value by updating the values of its parent nodes. The specifics of the application will 
typically determine which method (or combination) will be more appropriate. 

It is important that we stress the level of refinement enjoyed by the Q-DAG propaga- 
tion scheme and the implications of this on the efficiency of query updates. Propagation in 
Q-DAGs is done at the arithmetic-operation level, which is contrasted with propagation at 
the message-operation level (used by many standard algorithms). Such propagation schemes 
are typically optimized by keeping validity flags of messages so that only invalid messages 
are recomputed when new evidence arrives. This will clearly avoid some unnecessary com- 
putations but can never avoid all unnecessary computations because a message is typically 
too coarse for this purpose. For example, if only one entry in a message is invalid, the 
whole message is considered invalid. Recomputing such a message will lead to many un- 
necessary computations. This problem will be avoided in Q-DAG propagation since validity 
flags are attributed to arithmetic operations, which are the building blocks of message oper- 
ations. Therefore, only the necessary arithmetic operations will be recomputed in a Q-DAG 
propagation scheme, leading to a more detailed level of optimization. 
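
As an illustration of the forward scheme, the Python sketch below keeps a cached value on every node and, when a root value changes, recomputes only nodes downstream of the change, stopping wherever a recomputed value turns out to be unchanged. It is a minimal sketch under stated assumptions: the `Node` class and `set_root` function are hypothetical names, and the simple work queue may revisit a node more than once (a more careful evaluator might process nodes in topological order).

```python
class Node:
    """A Q-DAG node that caches its value and knows its children."""

    def __init__(self, op=None, parents=(), value=0.0):
        self.op = op                      # "+", "*", or None for a root node
        self.parents = list(parents)
        self.children = []
        for parent in self.parents:
            parent.children.append(self)
        self.value = self.compute() if op else value

    def compute(self):
        values = [parent.value for parent in self.parents]
        if self.op == "+":
            return sum(values)
        result = 1.0
        for v in values:
            result *= v
        return result


def set_root(root, new_value):
    """Set a root node's value and push the change forward.

    Returns the number of operation nodes that were recomputed."""
    recomputed = 0
    if root.value == new_value:
        return recomputed                 # nothing changed, nothing to do
    root.value = new_value
    queue = list(root.children)
    while queue:
        node = queue.pop(0)
        new = node.compute()
        recomputed += 1
        if new != node.value:             # only changed values propagate further
            node.value = new
            queue.extend(node.children)
    return recomputed


def num(p):
    return Node(value=p)


# Figure 4(b), with both ESNs set to 1 while C is unknown.
c_on, c_off = Node(value=1.0), Node(value=1.0)
m_on = Node("+", [Node("*", [num(0.9), c_on]), Node("*", [num(0.1), c_off])])
m_off = Node("+", [Node("*", [num(0.5), c_on]), Node("*", [num(0.5), c_off])])
q_on = Node("+", [Node("*", [num(0.075), m_on]), Node("*", [num(0.56), m_off])])
q_off = Node("+", [Node("*", [num(0.225), m_on]), Node("*", [num(0.14), m_off])])

before = q_on.value                       # Pr(B=ON) with no evidence
steps = set_root(c_off, 0.0)              # evidence C=ON arrives
```

Setting the same root to the same value again triggers no recomputation at all, which is the kind of fine-grained saving described above.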

We also stress that the process of evaluating and updating a Q-DAG is done outside of 
probability theory and belief network inference. This makes the development of efficient on- 
line inference software accessible to a larger group of people who may lack strong backgrounds 
in these areas.⁴



2.2 The Availability of Evidence 

The construction of a Q-DAG requires the identification of query and evidence variables. This
may give an incorrect impression that we must know up front which variables are observed
and which are not. This could be problematic in (1) applications where one may lose a sensor
reading, thus changing the status of a variable from being observed to being unobserved;
and (2) applications where some variable may be expensive to observe, leading to an on-line
decision on whether to observe it or not (using some value-of-information computation).

4. In fact, it appears that a background in compiler theory may be more relevant to generating an efficient
evaluator than a background in belief network theory.

Figure 6: A belief network and its corresponding Q-DAG in which variable B is declared to
be both query and evidence.

Both of these situations can be dealt with in a Q-DAG framework. First, as we mentioned 
earlier, Q-DAGs allow us to handle missing evidence through the use of the $\diamond$ notation, which
denotes an unknown value of a variable. Therefore, Q-DAGs can handle missing sensor 
readings. Second, a variable can be declared to be both query and evidence. This means 
that we can incorporate evidence about this variable when it is available, and also compute 
the probability distribution of the variable in case evidence is not available. Figure 6 depicts a 
Q-DAG in which variable A is declared to be a query variable, while variable B is declared to 
be both an evidence and a query variable (both variables have true and false as their values) . 
In this case, we have two ESNs for variable B and also two query nodes (see Figure 6). This 
Q-DAG can be used in two ways: 

1. To compute the probability distributions of variables A and B when no evidence is 
available about B. Under this situation, the values of n(B, true) and n(B, false) are 
set to 1, and we have 

Pr(A=true) = .3 * .1 + .3 * .9 = .3
Pr(A=false) = .8 * .7 + .7 * .2 = .7
Pr(B=true) = .3 * .1 + .8 * .7 = .59
Pr(B=false) = .3 * .9 + .7 * .2 = .41

2. To compute the probability of variable A when evidence is available about B. For
example, suppose that we observe B to be false. The value of n(B, true) will then be
set to 0 and the value of n(B, false) will be set to 1, and we have

Pr(A=true, B=false) = .3 * .9 = .27
Pr(A=false, B=false) = .7 * .2 = .14
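
Both uses of this Q-DAG can be checked with a small Python sketch. It does not reproduce the node layout of Figure 6; instead it writes out one arithmetic expression per query node that is consistent with the products shown above. The function name `evaluate_figure6` is an assumption made for illustration, and the four numeric products are read off the arithmetic in the text.

```python
def evaluate_figure6(b_true: float, b_false: float) -> dict:
    """Evaluate the four query nodes given the values of the ESNs
    n(B, true) and n(B, false)."""
    # One term per joint instantiation of A and B, each gated by the
    # ESN for the corresponding value of B.
    a_true_b_true = 0.3 * 0.1 * b_true
    a_true_b_false = 0.3 * 0.9 * b_false
    a_false_b_true = 0.8 * 0.7 * b_true
    a_false_b_false = 0.7 * 0.2 * b_false
    return {
        "A=true": a_true_b_true + a_true_b_false,
        "A=false": a_false_b_true + a_false_b_false,
        "B=true": a_true_b_true + a_false_b_true,
        "B=false": a_true_b_false + a_false_b_false,
    }


no_evidence = evaluate_figure6(b_true=1, b_false=1)   # B unknown
b_is_false = evaluate_figure6(b_true=0, b_false=1)    # B observed to be false
```

With B observed to be false, the query node for B=true evaluates to 0 and the one for B=false evaluates to the probability of the evidence.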

The ability to declare a variable as both an evidence and a query variable seems to be 
essential in applications where (1) a decision may need to be made on whether to collect 
evidence about some variable B; and (2) making the decision requires knowing the probability 
distribution of variable B. For example, suppose that we are using the following formula
(Pearl, 1988, Page 313) to compute the utility of observing variable B:

$$\mathit{Utility\_Of\_Observing}(B) = \sum_b \Pr(B = b \mid \mathbf{e})\, U(B = b),$$

where U(B = b) is the utility for the decision maker of finding that variable B has value b.
Suppose that U(B = true) = $2.5 and U(B = false) = -$3. We can use the Q-DAG to
compute the probability distribution of B and use it to evaluate Utility_Of_Observing(B):

Utility_Of_Observing(B) = ($2.5 * .59) + (-$3 * .41) = $0.245,

which leads us to observe variable B. Observing B, we find that its value is false. We can 
then accommodate this evidence into the Q-DAG and continue with our analysis. 
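
The value-of-information step can be sketched in Python as follows. The distribution of B is the one obtained from the Q-DAG when no evidence is available; the function name `utility_of_observing` and the decision rule "observe when the utility is positive" are assumptions made for this illustration.

```python
def utility_of_observing(distribution: dict, utility: dict) -> float:
    """Expected utility: sum over b of Pr(B = b | e) * U(B = b)."""
    return sum(distribution[b] * utility[b] for b in distribution)


pr_b = {"true": 0.59, "false": 0.41}   # from the Q-DAG, B unobserved
u_b = {"true": 2.5, "false": -3.0}     # utilities in dollars

value = utility_of_observing(pr_b, u_b)
should_observe = value > 0             # assumed decision rule
```

Once B has been observed, the same Q-DAG is re-evaluated with the evidence-specific nodes of B set accordingly.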

3. Generating Query DAGs 

This section shows how Q-DAGs can be generated using traditional algorithms for exact 
belief-network inference. In particular, we will show how Q-DAGs can be generated using the 
clustering (join tree, Jensen, LS) algorithm (Jensen, Lauritzen, & Olesen, 1990; Shachter, 
Andersen, & Szolovits, 1994; Shenoy & Shafer, 1986), the polytree algorithm, and cutset 
conditioning (Pearl, 1988; Peot & Shachter, 1991). We will also outline properties that must 
be satisfied by other belief network algorithms in order to adapt them for generating Q-DAGs 
as we propose. 

3.1 The Clustering Algorithm 

We provide a sketch of the clustering algorithm in this section. Readers interested in more 
details are referred to (Shachter et al., 1994; Jensen et al., 1990; Shenoy & Shafer, 1986). 
According to the clustering method, we start by: 

1. constructing a join tree of the given belief network;⁵

2. assigning the matrix of each variable in the belief network to some cluster that contains
the variable's family.

5. A join tree is a tree of clusters that satisfies the following property: the intersection of any two clusters
belongs to all clusters on the path connecting them.

The join tree is a secondary structure on which the inference algorithm operates. We need 
the following notation to state this algorithm: 

- $S_1, \ldots, S_n$ are the clusters, where each cluster corresponds to a set of variables in the
original belief network.

- $\Psi_i$ is the potential function over cluster $S_i$, which is a mapping from instantiations of
variables in $S_i$ into real numbers.

- $P_i$ is the posterior probability distribution over cluster $S_i$, which is a mapping from
instantiations of variables in $S_i$ into real numbers.

- $M_{ij}$ is the message sent from cluster $S_i$ to cluster $S_j$, which is a mapping from
instantiations of variables in $S_i \cap S_j$ into real numbers.

- e is the given evidence, that is, an instantiation of evidence variables E.

We also assume the standard multiplication and marginalization operations on potentials. 

Our goal now is to compute the potential Pr(X,e) which maps each instantiation x of 
variable X in the belief network into the probability Pr(x,e). Given this notation, we can 
state the algorithm as follows: 

• Potential functions are initialized using

$$\Psi_i = \prod_X \mathrm{Pr}_X \, \lambda_X,$$

where

- X is a variable whose matrix is assigned to cluster $S_i$;

- $\mathrm{Pr}_X$ is the matrix for variable X: a mapping from instantiations of the family of
X into conditional probabilities; and

- $\lambda_X$ is the likelihood vector for variable X: $\lambda_X(x)$ is 1 if x is consistent with given
evidence e and 0 otherwise.

• Posterior distributions are computed using

$$P_i = \Psi_i \prod_k M_{ki},$$

where $S_k$ are the clusters adjacent to cluster $S_i$.

• Messages are computed using

$$M_{ij} = \sum_{S_i \setminus S_j} \Psi_i \prod_{k \neq j} M_{ki},$$

where $S_k$ are the clusters adjacent to cluster $S_i$.

• The potential Pr(X, e) is computed using

$$\Pr(X, \mathbf{e}) = \sum_{S_i \setminus \{X\}} P_i,$$

where $S_i$ is a cluster to which X belongs.

These equations are used as follows. To compute the probability of a variable, we must 
compute the posterior distribution of a cluster containing the variable. To compute the 
posterior distribution of a cluster, we collect messages from neighboring clusters. A message 
from cluster Si to Sj is computed by collecting messages from all clusters adjacent to Si 
except for Sj. 
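
To make the flow of these equations concrete, the following Python sketch runs them numerically on a two-cluster join tree with clusters {A, C} and {A, B}. The potentials are assumptions taken from the arithmetic of the running example of Section 2, and the dictionary-based representation and the names `likelihood` and `joint_b` are illustrative only.

```python
from itertools import product

VALUES = ("ON", "OFF")

# Assumed numbers, taken from the running example.
PR_C_GIVEN_A = {("ON", "ON"): 0.9, ("ON", "OFF"): 0.1,
                ("OFF", "ON"): 0.5, ("OFF", "OFF"): 0.5}   # keyed (a, c)
PSI_2 = {("ON", "ON"): 0.075, ("ON", "OFF"): 0.225,
         ("OFF", "ON"): 0.56, ("OFF", "OFF"): 0.14}        # keyed (a, b)


def likelihood(evidence_c):
    """Likelihood vector of C: 1 if c is consistent with the evidence
    (None means no evidence about C), 0 otherwise."""
    return {c: 1.0 if evidence_c in (c, None) else 0.0 for c in VALUES}


def joint_b(evidence_c=None):
    """Compute Pr(B, e) by one message pass from cluster 1 to cluster 2."""
    lam = likelihood(evidence_c)
    # Initialization of cluster 1: matrix of C times likelihood vector of C.
    psi_1 = {(a, c): PR_C_GIVEN_A[a, c] * lam[c]
             for a, c in product(VALUES, VALUES)}
    # Message from cluster 1 to cluster 2: sum out C.
    m_12 = {a: sum(psi_1[a, c] for c in VALUES) for a in VALUES}
    # Posterior of cluster 2: its potential times the incoming message.
    p_2 = {(a, b): PSI_2[a, b] * m_12[a]
           for a, b in product(VALUES, VALUES)}
    # Pr(B, e): sum out A.
    return {b: sum(p_2[a, b] for a in VALUES) for b in VALUES}
```

Note that the whole computation is repeated from the initialization step each time the evidence changes, as discussed next.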

This statement of the join tree algorithm is appropriate for situations where the evidence 
is not changing frequently since it involves computing initial potentials each time the evidence 
changes. This is not necessary in general and one can provide more optimized versions of the 
algorithm. This issue, however, is irrelevant in the context of generating Q-DAGs because 
updating probabilities in face of evidence changes will take place at the Q-DAG level, which 
includes its own optimization technique that we discuss later. 

3.2 Generating Q-DAGs 

To generate Q-DAGs using the clustering method, we have to go through two steps. First, 
we have to modify the initialization of potential functions so that the join tree is quantified 
using Q-DAG nodes instead of numeric probabilities. Second, we have to replace numeric 
addition and multiplication in the algorithm by analogous functions that operate on Q-DAG 
nodes. In particular: 

1. Numeric multiplication * is replaced by an operation $\otimes$ that takes Q-DAG nodes
$n_1, \ldots, n_i$ as arguments, constructs and returns a new node n with label * and parents
$n_1, \ldots, n_i$.

2. Numeric addition + is replaced by an operation $\oplus$ that takes Q-DAG nodes $n_1, \ldots, n_i$
as arguments, constructs and returns a new node n with label + and parents $n_1, \ldots, n_i$.

Therefore, instead of numeric operations, we have Q-DAG-node constructors. And instead 
of returning a number as a computation result, we now return a Q-DAG node. 
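
A minimal Python sketch of such node constructors is given below. The `Node` class and the function names `n`, `otimes` and `oplus` are hypothetical; they only illustrate that the constructors build structure instead of computing numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: object        # "+", "*", a number, or an ESN pair (variable, value)
    parents: tuple = ()  # empty for root nodes


def n(label):
    """Root node: a numeric node n(p) or an ESN n(V, v)."""
    return Node(label)


def otimes(*nodes):
    """Q-DAG counterpart of multiplication: a new node labeled *."""
    return Node("*", tuple(nodes))


def oplus(*nodes):
    """Q-DAG counterpart of addition: a new node labeled +."""
    return Node("+", tuple(nodes))


# Where numeric code would compute 0.9 * x + 0.1 * y, the compiler
# builds a node describing that computation instead.
example = oplus(otimes(n(0.9), n(("C", "ON"))),
                otimes(n(0.1), n(("C", "OFF"))))
```

An inference routine written against `otimes` and `oplus` rather than `*` and `+` therefore returns an expression that can be evaluated later, once evidence is known.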

Before we state the Q-DAG clustering algorithm, realize that we now do not have evidence 
e, but instead we have a set of evidence variables E for which we will collect evidence. 
Therefore, the Q-DAG algorithm will not compute an answer to a query Pr(x, e), but instead 
will compute a Q-DAG node that evaluates to Pr(x, e) under the instantiation e of variables 
E. 

In the following equations, potentials are mappings from variable instantiations to Q- 
DAG nodes (instead of numbers). For example, the matrix for variable X will map each 
instantiation of X's family into a Q-DAG node n(p) instead of mapping it into the number 
p. The Q-DAG operations $\otimes$ and $\oplus$ are extended to operate on these new potentials in the
same way that * and + are extended in the clustering algorithm. 

The new set of equations is: 
• Potential functions are initialized using

$$\Psi_i = \bigotimes_X n(\mathrm{Pr}_X) \otimes \bigotimes_E n(\lambda_E),$$

where

- X is a variable whose matrix is assigned to cluster $S_i$;

- $n(\mathrm{Pr}_X)$ is the Q-DAG matrix for X: a mapping from instantiations of X's family
into Q-DAG nodes representing conditional probabilities;

- E is an evidence variable whose matrix is assigned to cluster $S_i$; and

- $n(\lambda_E)$ is the Q-DAG likelihood vector of variable E: $n(\lambda_E)(e) = n(E, e)$, which
means that node $n(\lambda_E)(e)$ evaluates to 1 if e is consistent with given evidence
and 0 otherwise.

• Posterior distributions are computed using

$$P_i = \Psi_i \otimes \bigotimes_k M_{ki},$$

where $S_k$ are the clusters adjacent to cluster $S_i$.

• Messages are computed using

$$M_{ij} = \bigoplus_{S_i \setminus S_j} \Psi_i \otimes \bigotimes_{k \neq j} M_{ki},$$

where $S_k$ are the clusters adjacent to cluster $S_i$.

• The Q-DAG nodes for answering queries of the form Pr(x, e) are computed using

$$\mathit{Qnode}(X) = \bigoplus_{S_i \setminus \{X\}} P_i,$$

where $S_i$ is a cluster to which X belongs.

Here Qnode(X) is a potential that maps each instantiation x of variable X into the Q-DAG 
node Qnode(X)(x) which evaluates to Pr(x,e) for any given instantiation e of variables E. 

Hence, the only modifications we made to the clustering algorithm are (a) changing 
the initialization of potential functions and (b) replacing multiplication and addition with 
Q-DAG constructors of multiplication and addition nodes. 

3.3 An Example 

We now show how the proposed Q-DAG algorithm can be used to generate a Q-DAG for 
the belief network in Figure 4(a). 

We have only one evidence variable in this example, C, and we are interested in generating
a Q-DAG for answering queries about variable B, that is, queries of the form Pr(b, e).
Figure 7(a) shows the join tree for the belief network in Figure 4(a), where the tables contain
the potential functions needed for the probabilistic clustering algorithm. Figure 7(b) shows
the join tree again, but the tables contain the potential functions needed by the Q-DAG
clustering algorithm. Note that the tables are filled with Q-DAGs instead of numbers.

Figure 7: A join tree quantified with numbers (a), and with Q-DAG nodes (b).

We now apply the Q-DAG algorithm. To compute the Q-DAG nodes that will evaluate
to Pr(b, e), we must compute the posterior distribution $P_2$ over cluster $S_2$ since this is a
cluster to which variable B belongs. We can then sum the distribution over variable A to
obtain what we want. To compute the distribution $P_2$ we must first compute the message
$M_{12}$ from cluster $S_1$ to cluster $S_2$.

The message $M_{12}$ is computed by summing the potential function $\Psi_1$ of cluster $S_1$ over
all possible values of variable C, i.e., $M_{12} = \bigoplus_C \Psi_1$, which leads to:

$$\begin{aligned}
M_{12}(A{=}ON) &= [n(.9) \otimes n(C, ON)] \oplus [n(.1) \otimes n(C, OFF)], \\
M_{12}(A{=}OFF) &= [n(.5) \otimes n(C, ON)] \oplus [n(.5) \otimes n(C, OFF)].
\end{aligned}$$

The posterior distribution over cluster $S_2$, $P_2$, is computed using $P_2 = \Psi_2 \otimes M_{12}$, which
leads to

$$\begin{aligned}
P_2(A{=}ON, B{=}ON) &= n(.075) \otimes [[n(.9) \otimes n(C, ON)] \oplus [n(.1) \otimes n(C, OFF)]] \\
P_2(A{=}ON, B{=}OFF) &= n(.225) \otimes [[n(.9) \otimes n(C, ON)] \oplus [n(.1) \otimes n(C, OFF)]] \\
P_2(A{=}OFF, B{=}ON) &= n(.56) \otimes [[n(.5) \otimes n(C, ON)] \oplus [n(.5) \otimes n(C, OFF)]] \\
P_2(A{=}OFF, B{=}OFF) &= n(.14) \otimes [[n(.5) \otimes n(C, ON)] \oplus [n(.5) \otimes n(C, OFF)]].
\end{aligned}$$

The Q-DAG node Qnode(b) for answering queries of the form Pr(b, e) is computed by
summing the posterior $P_2$ over variable A, $Qnode = \bigoplus_{S_2 \setminus \{B\}} P_2$, leading to

$$\begin{aligned}
Qnode(B{=}ON) = {} & [n(.075) \otimes [[n(.9) \otimes n(C, ON)] \oplus [n(.1) \otimes n(C, OFF)]]] \oplus \\
& [n(.56) \otimes [[n(.5) \otimes n(C, ON)] \oplus [n(.5) \otimes n(C, OFF)]]] \\
Qnode(B{=}OFF) = {} & [n(.225) \otimes [[n(.9) \otimes n(C, ON)] \oplus [n(.1) \otimes n(C, OFF)]]] \oplus \\
& [n(.14) \otimes [[n(.5) \otimes n(C, ON)] \oplus [n(.5) \otimes n(C, OFF)]]],
\end{aligned}$$

which is the Q-DAG depicted in Figure 4(b). Therefore, the result of applying the algorithm
is two Q-DAG nodes: one will evaluate to Pr(B=ON, e) and the other will evaluate to
Pr(B=OFF, e) under any instantiation e of evidence variables E.
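
The construction in this example can be mirrored in a few lines of code. The following is a minimal Python sketch, not the implementation used in the paper: the `Node` class and the helpers `n`, `esn`, `add`, `mul` and `evaluate` are hypothetical names introduced here for illustration, and the evaluator is a plain recursive one that does not share work between repeated subexpressions. The numbers are those of the example above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Node:
    kind: str                          # "num", "esn", "add", or "mul"
    value: float = 0.0                 # payload of a numeric node
    var: str = ""                      # variable of an evidence-specific node
    state: str = ""                    # value of that variable
    parents: Tuple["Node", ...] = ()   # inputs of an operation node


def n(p: float) -> Node:
    return Node("num", value=p)


def esn(var: str, state: str) -> Node:
    return Node("esn", var=var, state=state)


def add(*parents: Node) -> Node:       # plays the role of the addition constructor
    return Node("add", parents=parents)


def mul(*parents: Node) -> Node:       # plays the role of the multiplication constructor
    return Node("mul", parents=parents)


def evaluate(node: Node, evidence: Dict[str, str]) -> float:
    """Evaluate a node; `evidence` maps observed variables to their values."""
    if node.kind == "num":
        return node.value
    if node.kind == "esn":
        observed = evidence.get(node.var)   # None means "no evidence on var"
        return 1.0 if observed is None or observed == node.state else 0.0
    values = [evaluate(p, evidence) for p in node.parents]
    if node.kind == "add":
        return sum(values)
    result = 1.0
    for v in values:
        result *= v
    return result


# Message M12: sum the potential of cluster S1 over variable C.
m12 = {
    "ON": add(mul(n(0.9), esn("C", "ON")), mul(n(0.1), esn("C", "OFF"))),
    "OFF": add(mul(n(0.5), esn("C", "ON")), mul(n(0.5), esn("C", "OFF"))),
}

# Potential of cluster S2, indexed by (value of A, value of B).
psi2 = {
    ("ON", "ON"): n(0.075), ("ON", "OFF"): n(0.225),
    ("OFF", "ON"): n(0.56), ("OFF", "OFF"): n(0.14),
}

# Posterior P2 = Psi2 (x) M12, then sum out A to obtain the query nodes.
p2 = {(a, b): mul(psi2[(a, b)], m12[a]) for (a, b) in psi2}
qnode = {b: add(p2[("ON", b)], p2[("OFF", b)]) for b in ("ON", "OFF")}

print(evaluate(qnode["ON"], {"C": "ON"}))   # Pr(B=ON, C=ON)
print(evaluate(qnode["ON"], {}))            # Pr(B=ON), no evidence
```

Under evidence C=ON, the first query node should come out to about .075 × .9 + .56 × .5 = .3475; with no evidence, both evidence-specific nodes evaluate to 1 and the same node gives the prior .635.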

3.4 Computational Complexity of Q-DAG Generation 

The computational complexity of the algorithm for generating Q-DAGs is determined by
the computational complexity of the clustering algorithm. In particular, the proposed algorithm
applies a $\oplus$-operation precisely when the clustering algorithm applies an addition-operation.
Similarly, it applies a $\otimes$-operation precisely when the clustering algorithm applies
a multiplication-operation. Therefore, if we assume that $\oplus$ and $\otimes$ take constant time, then
both algorithms have the same time complexity.

Each application of $\oplus$ or $\otimes$ ends up adding a new node to the Q-DAG. And this is the
only way a new node can be added to the Q-DAG. Moreover, the number of parents of each
added node is equal to the number of arguments that the corresponding arithmetic operation
is invoked on in the clustering algorithm. Therefore, the space complexity of a Q-DAG is
the same as the time complexity of the clustering algorithm.

In particular, this means that the space complexity of Q-DAGs in terms of the number 
of evidence variables is the same as the time complexity of the clustering algorithm in those 
terms. Moreover, each evidence variable E will add only m evidence-specific nodes to the 
Q-DAG, where m is the number of values that variable E can take. This is important to 
stress because without this complexity guarantee it may be hard to distinguish between the 
proposed approach and a brute-force approach that builds a big table containing all possible 
instantiations of evidence variables together with their corresponding distributions of query 
variables. 
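
To make the contrast with the brute-force table concrete, the following Python sketch compares the number of table rows with the number of evidence-specific nodes. It is only an illustration under simplifying assumptions: every evidence variable is assumed to take the same number of values m, the table is assumed to have one row per full instantiation of the evidence variables, and the function names are hypothetical. The count of evidence-specific nodes is not the full Q-DAG size, which also includes numeric and operation nodes and is bounded by the time complexity of the clustering algorithm.

```python
def brute_force_rows(num_evidence_vars: int, values_per_var: int) -> int:
    """Rows of a table with one entry per full evidence instantiation: m ** k."""
    return values_per_var ** num_evidence_vars


def evidence_specific_nodes(num_evidence_vars: int, values_per_var: int) -> int:
    """Evidence-specific nodes in a Q-DAG: m per evidence variable, so k * m."""
    return num_evidence_vars * values_per_var


for k in (4, 10, 20):
    print(k, brute_force_rows(k, 3), evidence_specific_nodes(k, 3))
```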

3.5 Other Generation Algorithms 

The polytree algorithm is a special case of the clustering algorithm as shown in (Shachter 
et al., 1994). Therefore, the polytree algorithm can also be modified as suggested above 
to compute Q-DAGs. This also means that cutset conditioning can be easily modified to
compute Q-DAGs: for each instantiation c of the cutset C, we compute a Q-DAG node for
Pr(x, c, e) using the polytree algorithm and then take the $\oplus$-sum of the resulting nodes.

Most algorithms for exact inference in belief networks can be adapted to generate Q- 
DAGs. In general, an algorithm must satisfy a key condition to be adaptable for computing 
Q-DAGs as we suggested above. The condition is that the behavior of the algorithm should 
never depend on the specific evidence obtained, but should only depend on the variables 
about which evidence is collected. That is, whether variable E is instantiated to value $v_1$
or value $v_2$ should not affect the complexity of the algorithm. Only whether variable E is
instantiated or not should matter. 

Most belief-network algorithms that we are aware of satisfy this property. The reason
for this seems to be the notion of probabilistic independence on which these algorithms
are based. Specifically, what is read from the topology of a belief network is a relation
I(X, Z, Y), stating that variables X and Y are independent given variables Z. That is,

Pr(x, y | z) = Pr(x | z)Pr(y | z)

for all instantiations x, y, z of these variables. It is possible, however, for this not to hold
for all instantiations of z but only for specific ones. Most standard algorithms we are aware
of do not take advantage of this instantiation-specific notion of independence. 6 Therefore,
they cannot attach any computational significance to the specific value to which a variable 
is instantiated. This property of existing algorithms is what makes them easily adaptable to 
the generation of Q-DAGs. 

3.6 Soundness of the Q-DAG Clustering Algorithm 

The soundness of the proposed algorithm is stated below. The proof is given in Appendix A. 

Theorem 1 Suppose that Qnode(X) is a Q-DAG potential generated by the Q-DAG clustering
algorithm for query variable X and evidence variables E. Let e' be an instantiation
of some variables in E, and let Q-DAG evidence $\mathcal{E}$ be defined as follows:

$$\mathcal{E}(E) = \begin{cases} e, & \text{if evidence } e' \text{ sets variable } E \text{ to value } e; \\ \diamond, & \text{otherwise.} \end{cases}$$

Then

$$\mathcal{M}_{\mathcal{E}}(Qnode(X)(x)) = Pr(x, e').$$

That is, the theorem guarantees that the Q-DAG nodes generated by the algorithm will
always evaluate to their corresponding probabilities under any partial or full instantiation
of evidence variables.

4. Reducing Query DAGs

This section is focused on reducing Q-DAGs after they have been generated. The main
motivation behind this reduction is twofold: faster evaluation of Q-DAGs and less space to
store them. Interestingly enough, we have observed that a few, simple reduction techniques
tend in certain cases to subsume optimization techniques that have been influential in practical
implementations of belief-network inference. Therefore, reducing Q-DAGs can be very
important practically.

This section is structured as follows. First, we start by discussing four simple reduction
operations in the form of rewrite rules. We then show examples in which these reductions subsume
two key optimization techniques known as network-pruning and computation-caching.

4.1 Reductions

The goal of Q-DAG reduction is to reduce the size of a Q-DAG while maintaining the
arithmetic expression it represents. In describing the equivalence of arithmetic expressions,
we define the notion of Q-DAG equivalence:

Definition 5 Two Q-DAGs are equivalent iff they have the same set of evidence-specific
nodes and they have the same output for all possible Q-DAG evidence.

6. Some algorithms for two-level binary networks (BN20 networks), and some versions of the SPI algorithm
do take advantage of these independences.

Figure 8: The four main methods for Q-DAG reduction.



Figure 8 shows four basic reduction operations that we have experimented with: 

1. Identity elimination: eliminates a numeric node if it is an identity element of its child 
operation node. 

2. Numeric reduction: replaces an operation node with a numeric node if all its parents 
are numeric nodes. 

3. Associative merging: eliminates an operation node using operation associativity. 

4. Commutative merging: eliminates an operation node using operation commutativity. 

These rules can be applied successively, and in different orders, until no more applications are
possible.
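
Two of these rules, numeric reduction and identity elimination, can be sketched as a bottom-up rewrite. The Python code below is an illustration only: the classes `Num`, `Esn`, `Op` and the function `reduce_node` are hypothetical names, the expression is treated as a tree (so shared nodes would be rewritten once per use), identity elements are detected by exact floating-point comparison, and an operation node left with a single parent is replaced by that parent, which is an assumption of this sketch rather than a rule stated above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Esn:
    var: str
    state: str


@dataclass(frozen=True)
class Op:
    kind: str                      # "add" or "mul"
    parents: Tuple["Node", ...]    # inputs of the operation


Node = Union[Num, Esn, Op]

IDENTITY = {"add": 0.0, "mul": 1.0}


def reduce_node(node: Node) -> Node:
    """Apply identity elimination and numeric reduction bottom-up."""
    if not isinstance(node, Op):
        return node
    parents = [reduce_node(p) for p in node.parents]

    # Identity elimination: drop numeric parents equal to the identity element.
    identity = IDENTITY[node.kind]
    kept = [p for p in parents if not (isinstance(p, Num) and p.value == identity)]
    if not kept:
        return Num(identity)
    if len(kept) == 1:
        return kept[0]             # assumption: a unary operation is its parent

    # Numeric reduction: all parents are numbers, so compute the result now.
    if all(isinstance(p, Num) for p in kept):
        total = identity
        for p in kept:
            total = total + p.value if node.kind == "add" else total * p.value
        return Num(total)

    return Op(node.kind, tuple(kept))


# Hypothetical numbers: the inner sum is over a full conditional distribution.
expr = Op("mul", (Num(0.5), Esn("B", "ON"), Op("add", (Num(0.25), Num(0.75)))))
reduced = reduce_node(expr)
print(reduced)
```

In this small case the inner addition node is replaced by the number 1 through numeric reduction, and identity elimination then removes it from the multiplication node.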

We have proven that these operations are sound in (Darwiche & Provan, 1995). Based
on an analysis of network structure and preliminary empirical results, we have observed
that many factors govern the effectiveness of these operations. The degree to which reduction
operations, numeric reduction in particular, can reduce the size of the Q-DAG depends on
the topology of the given belief network and the set of evidence and query variables. For
example, if all root nodes are evidence variables of the belief network, and if all leaf nodes
are query variables, then numeric reduction will lead to little Q-DAG reduction.

We now focus on numeric reduction, showing how it sometimes subsumes two optimiza- 
tion techniques that have been influential in belief network algorithms. For both optimiza- 
tions, we show examples where an unoptimized algorithm that employs numeric reduction 
yields the same Q-DAG as an optimized algorithm. The major implication is that opti- 
mizations can be done uniformly at the Q-DAG level, freeing the underlying belief network 
algorithms from such implementational complications. 

The following examples assume that we are applying the polytree algorithm to singly-connected
networks.

Figure 9: A simple belief network before pruning (a) and after pruning (b). The light-shaded
node, A, is a query node, and the dark-shaded node, B, is an evidence node.

Figure 10: A Q-DAG (a) and its reduction (b).



4.2 Network Pruning 

Pruning is the process of deleting irrelevant parts of a belief network before invoking infer- 
ence. Consider the network in Figure 9(a) for an example, where B is an evidence variable 
and A is a query variable. One can prune node C from the network, leading to the network 
in Figure 9(b). Any query of the form Pr(a | b) has the same value with respect to either
network. It should be clear that working with the smaller network is preferred. In general, 
pruning can lead to dramatic savings since it can reduce a multiply-connected network to a 
singly-connected one.

If we generate a Q-DAG for the network in Figure 9(a) using the polytree algorithm, we
obtain the one in Figure 10(a). This Q-DAG corresponds to the following expression,

$$Pr(A{=}ON, e) = Pr(A{=}ON) \sum_b \lambda_B(b)\, Pr(b \mid A{=}ON) \sum_c Pr(c \mid b).$$

If we generate a Q-DAG for the network in Figure 9(b), however, we obtain the one in
Figure 10(b), which corresponds to the following expression,

$$Pr(A{=}ON, e) = Pr(A{=}ON) \sum_b \lambda_B(b)\, Pr(b \mid A{=}ON).$$

As expected, this Q-DAG is smaller than the Q-DAG in Figure 10(a), and contains a subset
of the nodes in Figure 10(a).

The key observation, however, is that the optimized Q-DAG in Figure 10(b) can be 
obtained from the unoptimized one in Figure 10(a) using Q-DAG reduction. In particular, 
the nodes enclosed in dotted lines can be collapsed using numeric reduction into a single 
node with value 1. Identity elimination can then remove the resulting node, leading to the 
optimized Q-DAG in Figure 10(b). 
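
The collapse to the value 1 can be checked directly on the expression for Figure 10(a). The short derivation below assumes, as that expression suggests, that B is the only parent of C, so that the inner sum ranges over a complete conditional distribution:

```latex
\sum_c Pr(c \mid b) = 1 \quad \text{for every value } b,
\qquad \text{hence} \qquad
Pr(A{=}ON) \sum_b \lambda_B(b)\, Pr(b \mid A{=}ON) \sum_c Pr(c \mid b)
  = Pr(A{=}ON) \sum_b \lambda_B(b)\, Pr(b \mid A{=}ON) \cdot 1 .
```

The factor 1 is the identity element of multiplication, which is why identity elimination can drop it.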

The more general observation, however, is that prunable nodes contribute identity el- 
ements when computing answers to queries. These contributions appear as Q-DAG nodes 
that evaluate to identity elements under all instantiations of evidence. Such nodes can be 
easily detected and collapsed into these identity elements using numeric reduction. Identity 
elimination can then remove them from the Q-DAG, leading to the same effect as network 
pruning. 7 Whether Q-DAG reduction can replace all possible pruning operations is an open 
question that is outside the scope of this paper. 

4.3 Computation Caching 

Caching computations is another influential technique for optimizing inference in belief net- 
works. To consider an example, suppose that we are applying the polytree algorithm to 
compute Pr(c,b) in the network of Figure 11. Given evidence, say B=ON, the algorithm 
will compute Pr(c, B=ON) by passing the messages shown in Figure 12. If the evidence
changes to B=OFF, however, an algorithm employing caching will not recompute the message
$\pi_B(a)$ (which represents the causal support from A to B (Pearl, 1988)) since the value of
this message does not depend on the evidence on B. 8 This kind of optimization is typically
implemented by caching the values of messages and by keeping track of which messages are
affected by what evidence.

7. Note, however, that Q-DAG reduction will not reduce the computational complexity of generating a
Q-DAG, although network pruning may. For example, a multiply-connected network may become
singly-connected after pruning, thereby reducing the complexity of generating a Q-DAG. But using Q-DAG
reduction, we still have to generate a Q-DAG by working with a multiply-connected network.

8. This can be seen by considering the following expression, which is evaluated incrementally by the polytree
algorithm through its message passes:

$$Pr(c, e) = \sum_b Pr(c \mid b)\, \underbrace{\lambda_B(b) \sum_a Pr(b \mid a)\, \underbrace{Pr(a)}_{\pi_B(a)}}_{\pi_C(b)}.$$

It is clear that the subexpression corresponding to the message $\pi_B(a)$ from A to B is independent of the
evidence on B.

Figure 11: A simple belief network for demonstrating the relationship between Q-DAG reduction
and computation caching. The light-shaded node, C, is a query node,
and the dark-shaded node, B, is an evidence node.

Figure 12: Message passing when C is queried and B is observed.

Now, consider the Q-DAG corresponding to this problem which is shown in Figure 13(a). 
The nodes enclosed in dotted lines correspond to the message from A to B. 9 These nodes do 
not have evidence-specific nodes in their ancestor set and, therefore, can never change values 
due to evidence changes. In fact, numeric reduction will replace each one of these nodes and 
its ancestors with a single node as shown in Figure 13(b). 

In general, if numeric reduction is applied to a Q-DAG, one is guaranteed the following:
(a) if a Q-DAG node represents a message that does not depend on evidence, that node will
not be re-evaluated given evidence changes; and (b) numeric reduction will guarantee this
under any Q-DAG evaluation method since it will replace the node and its ancestor set with
a single root node. 10

9. More precisely, they correspond to the expression $\sum_a Pr(b \mid a) Pr(a)$.

Figure 13: A Q-DAG (a) and its reduction (b).



4.4 Optimization in Belief-Network Inference 

Network pruning and computation caching have proven to be very influential in practical 
implementations of belief-network inference. In fact, our own experience has shown that 
these optimizations typically make the difference between a usable and a non-usable belief- 
network system. 

One problem with these optimizations, however, is that their implementations are
algorithm-specific, even though they are based on general principles (e.g., taking advantage of network
topology). Another problem is that they can make elegant algorithms complicated and hard
to understand. Moreover, these optimizations are often hard to define succinctly, and hence
are not well documented within the community.

In contrast, belief-network inference can be optimized by generating Q-DAGs using un- 
optimized inference algorithms, and then optimizing the generated Q-DAG through reduc- 
tion techniques. We have shown some examples of this earlier with respect to pruning and 
caching optimizations. However, whether this alternate approach to optimization is always 
feasible is yet to be known. A positive answer will clearly provide an algorithm-independent
approach to optimizing belief-network inference, which is practically important for at least
two reasons. First, Q-DAG reduction techniques seem to be much simpler to understand
and implement since they deal with graphically represented arithmetic expressions, without
having to invoke probability or belief network theory. Second, reduction operations are applicable
to Q-DAGs generated by any belief-network algorithm. Therefore, an optimization
approach based on Q-DAG reduction would be more systematic and accessible to a bigger
class of developers.

10. Note that Q-DAGs lead to a very refined caching mechanism if the Q-DAG evaluator (1) caches the value
of each Q-DAG node and (2) updates these cached values only when there is need to (that is, when the
value of a parent node changes). Such a refined mechanism allows caching the values of messages that
depend on evidence as well.

Figure 14: A simple belief network for car diagnosis.

5. A Diagnosis Example 

This section contains a comprehensive example illustrating the application of the Q-DAG 
framework to diagnostic reasoning. 

Consider the car troubleshooting example depicted in Figure 14. For this simple case 
we want to determine the probability distribution for the fault node, given evidence on four 
sensors: the battery-, alternator-, fuel- and oil-sensors. Each sensor provides information 
about its corresponding system. The fault node defines five possible faults: normal, clogged- 
fuel-injector, dead-battery, short-circuit, and broken-fuel-pump. 

If we denote the fault variable by F, and sensor variables by E, then we want to build
a system that can compute the probability Pr(f, e), for each fault f and any evidence e.
These probabilities represent an unnormalized probability distribution over the fault variable 
given sensor readings. In a Q-DAG framework, realizing this diagnostic system involves three 
steps: Q-DAG generation, reduction, and evaluation. The first two steps are accomplished 
off-line, while the final step is performed on-line. We now discuss each one of the steps in 
more detail. 

5.1 Q-DAG Generation 

The first step is to generate the Q-DAG. This is accomplished by applying the Q-DAG
clustering algorithm with the fault as a query variable and the sensors as evidence variables.
The resulting Q-DAG has five query nodes, Qnode(F = normal, e),
Qnode(F = clogged_fuel_injector, e), Qnode(F = dead_battery, e), Qnode(F = short_circuit, e), and
Qnode(F = broken_fuel_pump, e). Each node evaluates to the probability of the corresponding
fault under any instantiation of evidence. The probabilities constitute a differential
diagnosis that tells us which fault is most probable given certain sensor values.

Figure 15: A partial Q-DAG for the car example, displaying two of the five query nodes,
broken_fuel_pump and normal. The shaded regions are portions of the Q-DAG
that are shared by multiple query nodes; the values of these nodes are relevant
to the value of more than one query node.

Figure 15 shows a stylized description of the Q-DAG restricted to two of the five query
nodes, corresponding to Pr(F = broken_fuel_pump, e) and Pr(F = normal, e). The Q-DAG
structure is symmetric for each fault value and sensor.

Given that the Q-DAG is symmetric for these possible faults, for clarity of exposition
we look at just the subset needed to evaluate node Pr(F = broken_fuel_pump, e). Figure 16
shows a stylized version of the Q-DAG produced for this node. Following are some observations
about this Q-DAG. First, there is an evidence-specific node for every instantiation
of sensor variables, corresponding to all forms of sensor measurements possible. Second, all
other roots of the Q-DAG are probabilities. Third, one of the five parents of the query node
Pr(F = broken_fuel_pump, e) is for the prior on F = broken_fuel_pump, and the other four
are for the contributions of the four sensors. For example, Figure 16 highlights (in dots) that
part of the Q-DAG for computing the contribution of the battery sensor.

5.2 Q-DAG Reduction 

After generating a Q-DAG, one proceeds by reducing it using graph rewrite rules. Figure 16 
shows an example of such reduction with a Q-DAG that is restricted to one query node 
for simplicity. To give an idea of the kind of reduction that has been applied, consider the 
partial Q-DAG enclosed by dots in this figure. Figure 17 compares this reduced Q-DAG with 
the unreduced one from which it was generated. Given our goal of generating Q-DAGs that 
(a) can be evaluated as efficiently as possible and (b) require minimal space to store, it is
important to see, even in a simple example, how Q-DAG reduction can make a big difference
in their size.

Figure 16: A partial Q-DAG for the car example.

Figure 17: Reduced and unreduced Q-DAGs for the car diagnosis example.

5.3 Q-DAG Evaluation

Now that we have a reduced Q-DAG, we can use it to compute answers to diagnostic queries. 
This section presents examples of this evaluation with respect to the generated Q-DAG. 

Suppose that we obtain the readings dead, normal, ok and full for the battery, oil, 
alternator and fuel sensors, respectively. And let us compute the probability distribution 
over the fault variable. This obtained evidence is formalized as follows: 

- $\mathcal{E}(\mathit{battery\_sensor}) = \mathit{dead}$,

- $\mathcal{E}(\mathit{oil\_sensor}) = \mathit{normal}$,

- $\mathcal{E}(\mathit{alternator\_sensor}) = \mathit{ok}$,

- $\mathcal{E}(\mathit{fuel\_sensor}) = \mathit{full}$.

Evidence-specific nodes can now be evaluated according to Definition 3. For example, we
have

$$\mathcal{M}_{\mathcal{E}}[n(\mathit{battery\_sensor}, \mathit{charged})] = 0,$$

and

$$\mathcal{M}_{\mathcal{E}}[n(\mathit{battery\_sensor}, \mathit{dead})] = 1.$$

The evaluation of evidence-specific nodes is shown pictorially in Figure 18(a). Definition 3
can then be used to evaluate the remaining nodes: once the values of a node's parents
are known, the value of that node can be determined. Figure 18(b) depicts the results of
evaluating other nodes. The result of interest here is the probability 0.00434 assigned to the
query node Pr(fault = broken_fuel_pump, e).

Suppose now that evidence has changed so that the value of the fuel sensor is empty instead
of full. To update the probability assigned to node Pr(fault = broken_fuel_pump, e), a brute-force
method will re-evaluate the whole Q-DAG. However, if a forward propagation scheme
is used to implement the node evaluator, then only four nodes need to be re-evaluated in
Figure 18(b) (those enclosed in circles) instead of thirteen (the total number of nodes). We
stress this point because this refined updating scheme, which is easy to implement in this
framework, is much harder to achieve when one attempts to embed it in standard
belief-network algorithms based on message passing.
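
One way such a forward propagation scheme might be realized is sketched below in Python. This is an assumption-laden illustration rather than the evaluator used for Figure 18: the `QDag` class and its methods are hypothetical, node ids are assigned in creation order so that every node has a larger id than its parents, and the small Q-DAG and its numbers are invented for the example. Cached values are updated only for nodes downstream of an evidence-specific node whose value actually changed.

```python
import heapq
from math import prod


class QDag:
    """Hypothetical Q-DAG with cached node values and forward propagation."""

    def __init__(self):
        self.nodes = []      # node id -> (kind, payload, parent ids)
        self.children = []   # node id -> ids of nodes using it as an input
        self.value = []      # cached value of every node (no evidence at first)

    def _new(self, kind, payload, parents=()):
        nid = len(self.nodes)
        self.nodes.append((kind, payload, tuple(parents)))
        self.children.append([])
        for p in parents:
            self.children[p].append(nid)
        self.value.append(self._compute(nid, {}))
        return nid

    def num(self, x): return self._new("num", x)
    def esn(self, var, val): return self._new("esn", (var, val))
    def add(self, *ps): return self._new("add", None, ps)
    def mul(self, *ps): return self._new("mul", None, ps)

    def _compute(self, nid, evidence):
        kind, payload, parents = self.nodes[nid]
        if kind == "num":
            return payload
        if kind == "esn":
            var, val = payload
            return 1.0 if evidence.get(var) in (None, val) else 0.0
        vals = [self.value[p] for p in parents]   # parents are already up to date
        return sum(vals) if kind == "add" else prod(vals)

    def update(self, evidence, changed_vars):
        """Propagate a change of evidence on changed_vars; return the number
        of operation nodes that were re-evaluated."""
        heap = [i for i, (k, pl, _) in enumerate(self.nodes)
                if k == "esn" and pl[0] in changed_vars]
        heapq.heapify(heap)          # smallest id first = parents before children
        seen = set(heap)
        recomputed = 0
        while heap:
            nid = heapq.heappop(heap)
            new = self._compute(nid, evidence)
            if self.nodes[nid][0] != "esn":
                recomputed += 1
            if new != self.value[nid]:   # only changed values are pushed onward
                self.value[nid] = new
                for c in self.children[nid]:
                    if c not in seen:
                        seen.add(c)
                        heapq.heappush(heap, c)
        return recomputed


dag = QDag()
c_part = dag.add(dag.mul(dag.num(0.9), dag.esn("C", "ON")),
                 dag.mul(dag.num(0.1), dag.esn("C", "OFF")))
d_part = dag.add(dag.mul(dag.num(0.2), dag.esn("D", "ON")),
                 dag.mul(dag.num(0.8), dag.esn("D", "OFF")))
q = dag.mul(dag.num(0.5), c_part, d_part)

recomputed = dag.update({"C": "ON"}, {"C"})
print(recomputed, dag.value[q])
```

In this invented Q-DAG, observing C=ON touches only the operation nodes on the path from the changed evidence-specific node to the query node; the part that depends on D keeps its cached values.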

6. Concluding Remarks 

We have introduced a new paradigm for implementing belief-network inference that is ori- 
ented towards real-world, on-line applications. The proposed framework utilizes knowledge 
of query and evidence variables in an application to compile a belief network into an arith- 
metic expression called a Query DAG (Q-DAG). Each node of a Q-DAG represents a numeric 
operation, a number, or a symbol that depends on available evidence. Each leaf node of a 
Q-DAG represents the answer to a network query, that is, the probability of some event of 
interest. Inference on Q-DAGs is linear in their size and amounts to a standard evaluation 
of the arithmetic expressions they represent. 

Figure 18: Evaluating the Q-DAG for the car diagnosis example given evidence for sensors.
The bar in (a) indicates the instantiation of the ESNs. The shaded numbers in
(b) indicate probability values that are computed by the node evaluator. The
circled operations on the left-hand-side of (b) are the only ones that need to be
updated if evidence for the fuel-system sensor is altered, as denoted by the circled
ESNs.

A most important point to stress about the work reported here is that it is not proposing
a new algorithm for belief-network inference. What we are proposing is a paradigm for
implementing belief-network inference that is orthogonal to standard inference algorithms
and is engineered to meet the demands of real-world, on-line applications. This class of
applications is typically demanding for the following reasons:

1. It typically requires very short response time, i.e., milliseconds. 

2. It requires software to be written in specialized languages, such as ADA, C++, and 
assembly before it can pass certification procedures. 

3. It imposes severe restrictions on the available software and hardware resources in order 
to keep the cost of a "unit" (such as an electromechanical device) as low as possible. 

To address these real-world constraints, we are proposing that one compile a belief network
into a Q-DAG as shown in Figure 3 and use a Q-DAG evaluator for on-line reasoning. This
brings down the required memory to that needed for storing a Q-DAG and its evaluator. It
also brings down the required software to that needed for implementing a Q-DAG evaluator,
which is very simple as we have seen earlier.

Our proposed approach still requires a belief-network algorithm to generate a Q-DAG, 
but it makes the efficiency of such an algorithm less of a critical factor. 11 For example, 
we show that some standard optimizations in belief-network inference, such as pruning and 
caching, become less critical in a Q-DAG framework since these optimizations tend to be 
subsumed by simple Q-DAG reduction techniques, such as numeric reduction. 

The work reported in this paper can be extended in at least two ways. First, further Q- 
DAG reduction techniques could be explored, some oriented towards reducing the evaluation 
time of Q-DAGs, others towards minimizing the memory needed to store them. Second, we 
have shown that some optimization techniques that dramatically improve belief-network 
algorithms may become irrelevant to the size of Q-DAGs if Q-DAG reduction is employed. 
Further investigation is needed to prove formal results and guarantees on the effectiveness 
of Q-DAG reduction. 

We close this section by noting that the framework we proposed is also applicable to 
order-of-magnitude (OMP) belief networks, where multiplication and addition get replaced 
by addition and minimization, respectively (Goldszmidt, 1992; Darwiche & Goldszmidt, 
1994). The OMP Q-DAG evaluator, however, is much more efficient than its probabilistic 
counterpart since one may evaluate a minimization node without having to evaluate all its 
parents in many cases. This can make a considerable difference in the performance of a Q-DAG
evaluator.

11. We have shown how clustering and conditioning algorithms can be used for Q-DAG generation, but other
algorithms such as SPI (Li & D'Ambrosio, 1994; Shachter et al., 1990) can be used as well.

Appendix A. Proof of Theorem 1

Without loss of generality, we assume in this proof that all variables are declared as evidence 
variables. To prove this soundness theorem, all we need to show is that each Q-DAG po- 
tential will evaluate to its corresponding probabilistic potential under all possible evidence. 
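Formally, for any cluster S and variables X, the matrices of which are assigned to S, we
need to show that

$$\mathcal{M}_{\mathcal{E}}\Big(\bigotimes_X n(Pr_X) \otimes n(\lambda_X)\Big) = \prod_X Pr_X \lambda_X \qquad (1)$$

for a given evidence $\mathcal{E}$. Once we establish this, we are guaranteed that Qnode(X)(x) will
evaluate to the probability Pr(x, e) because the application of $\otimes$ and $\oplus$ in the Q-DAG algorithm
is isomorphic to the application of * and + in the probabilistic algorithm, respectively.

To prove Equation 1, we will extend the Q-DAG node evaluator $\mathcal{M}_{\mathcal{E}}$ to mappings in the
standard way. That is, if f is a mapping from instantiations to Q-DAG nodes, then $\mathcal{M}_{\mathcal{E}}(f)$
is defined as follows:

$$\mathcal{M}_{\mathcal{E}}(f)(x) =_{def} \mathcal{M}_{\mathcal{E}}(f(x)).$$

That is, we simply apply the Q-DAG node evaluator to the range of mapping f.
Note that $\mathcal{M}_{\mathcal{E}}(f \otimes g)$ will then be equal to $\mathcal{M}_{\mathcal{E}}(f)\mathcal{M}_{\mathcal{E}}(g)$. Therefore,

$$\begin{aligned}
\mathcal{M}_{\mathcal{E}}\Big(\bigotimes_X n(Pr_X) \otimes n(\lambda_X)\Big)
&= \prod_X \mathcal{M}_{\mathcal{E}}(n(Pr_X))\, \mathcal{M}_{\mathcal{E}}(n(\lambda_X)) \\
&= \prod_X Pr_X\, \mathcal{M}_{\mathcal{E}}(n(\lambda_X)) \quad \text{by definition of } n(Pr_X).
\end{aligned}$$

Note also that by definition of $n(\lambda_X)$, we have that $n(\lambda_X)(x)$ equals n(X, x). Therefore,

$$\begin{aligned}
\mathcal{M}_{\mathcal{E}}(n(\lambda_X))(x) &= \mathcal{M}_{\mathcal{E}}(n(\lambda_X)(x)) \\
&= \mathcal{M}_{\mathcal{E}}(n(X, x)) \\
&= \begin{cases} 1, & \text{if } \mathcal{E}(X) = x \text{ or } \mathcal{E}(X) = \diamond \\ 0, & \text{otherwise} \end{cases} \\
&= \lambda_X(x).
\end{aligned}$$

Therefore,

$$\mathcal{M}_{\mathcal{E}}\Big(\bigotimes_X n(Pr_X) \otimes n(\lambda_X)\Big) = \prod_X Pr_X \lambda_X.$$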